ARROW-1760: [Java] Add Apache Mnemonic (incubating) as alternative backed allocator #36

bigdata-memory · 2016-03-24T17:26:05Z

…note that move allocator services to the service-dist folder as the properties indicated in pom.xml.

julienledem · 2016-08-08T23:25:21Z

java/memory/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java

@@ -57,6 +59,17 @@ public PooledByteBufAllocatorL(MetricRegistry registry) {
    empty = new UnsafeDirectLittleEndian(new DuplicatedByteBuf(Unpooled.EMPTY_BUFFER));
  }

+  public static void setUpMnemonicUnpooledByteBufAllocator(MnemonicUnpooledByteBufAllocator<?> mubballocator) {


Instead of statically setting an allocator, it should be an optional constructor parameter of type ByteBufAllocator.
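A minimal sketch of the constructor-injection shape suggested here, in contrast to a static setter. It deliberately avoids Netty: `BufferSource` is a hypothetical stand-in for `ByteBufAllocator`, and `InjectableAllocator` is a hypothetical stand-in for `PooledByteBufAllocatorL`. Neither name exists in Arrow.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

// Hypothetical stand-in for io.netty.buffer.ByteBufAllocator, reduced to one method.
interface BufferSource {
  ByteBuffer allocate(int capacity);
}

// Default source: plain JVM direct memory.
final class DirectBufferSource implements BufferSource {
  @Override
  public ByteBuffer allocate(int capacity) {
    return ByteBuffer.allocateDirect(capacity);
  }
}

// Hypothetical stand-in for PooledByteBufAllocatorL.
final class InjectableAllocator {
  private final BufferSource source;

  // No-argument constructor keeps the existing default behaviour.
  InjectableAllocator() {
    this(new DirectBufferSource());
  }

  // Optional constructor: the caller supplies the backing source,
  // so no static, process-wide state has to be mutated.
  InjectableAllocator(BufferSource source) {
    this.source = Objects.requireNonNull(source, "source");
  }

  ByteBuffer buffer(int capacity) {
    return source.allocate(capacity);
  }
}
```

Callers that never pass a source keep the default path, so an alternative allocator stays strictly opt-in, for example `new InjectableAllocator(ByteBuffer::allocate)` for a heap-backed variant.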

Got it. Let me try to make it optional. Thanks.

I found that the field INNER_ALLOCATOR of AllocationManager is initialized with "new PooledByteBufAllocatorL()" in
https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java#L68
So should I add an optional constructor for AllocationManager as well?

The bufferWithoutReservation(..) method of BaseAllocator, in turn, instantiates and uses the AllocationManager:

arrow/java/memory/src/main/java/org/apache/arrow/memory/BaseAllocator.java

Line 309 in 2706b7f

final AllocationManager manager = new AllocationManager(this, size);

So is that method the best place to inject Mnemonic's allocator as a parameter?
Thanks!

Author: Kouhei Sutou <kou@clear-code.com> Closes apache#968 from kou/add-new-committers and squashes the following commits: 710558b [Kouhei Sutou] [Website] Add new committers

…release blog post Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#967 from wesm/ARROW-1353 and squashes the following commits: 804fe35 [Wes McKinney] Escape underscores in CHANGELOG.md 1b7c4b6 [Wes McKinney] Finish 0.6.0 blog post a78cb94 [Wes McKinney] Some updates for 0.6.0 site update

Closes apache#970 Change-Id: I49ea3f7f99d080c517fb21b86b7a27e17b04e20b

…rite_table Closes apache#971 Change-Id: I7c689b200a4f04af51928f6765362fef52c613e8

…mple. Update API doc site build instructions Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#973 from wesm/site-doc-updates and squashes the following commits: 8884b4a4 [Wes McKinney] Remove outdated pyarrow.jemalloc_memory_pool example. Add --with-plasma to Python doc build

Make Arrow buildable with jdk9: - upgrade checkstyle plugin to 6.19 - upgrade assembly plugin to 3.0.0 - update jmockit version to 1.33 Also add travis entry to build using Oracle JDK9 EA Author: Laurent Goujon <laurent@dremio.com> Closes apache#966 from laurentgo/laurent/jdk-9 and squashes the following commits: d009d01 [Laurent Goujon] Make mvn site optional since not working yet with jdk9 b3e5822 [Laurent Goujon] Update plugin version according to Maven team recommendations d62d409 [Laurent Goujon] Fix travis id for jdk9 92fe6d4 [Laurent Goujon] Make Arrow buildable with jdk9

@jacques-n

cc @jacques-n , @StevenMPhillips Patch Summary: As part of ARROW-801, we recently added getValidityBufferAddress(), getOffsetBufferAddress(), getDataBufferAddress() interfaces to get the virtual address of the ArrowBuf. We now have the following new interfaces to get the corresponding ArrowBuf: getValidityBuffer() getDataBuffer() getOffsetBuffer() Background: Currently we have getBuffer() method implemented as part of BaseDataValueVector abstract class. As part of patch for ARROW-276, NullableValueVectors no longer extends BaseDataValueVector -- they don't have to since they don't need the underlying data buffer (ArrowBuf data field) of BaseDataValueVector. The call to getBuffer() on NullableValueVectors simply delegates the operation to getBuffer() of underlying data/value vector. Problem: If a piece of code is working with ValueVector abstraction and the expected runtime type is Nullable<something>Vector, the compiler obviously complains about doing (v of type ValueVector).getBuffer(). Until now this worked as we kept the compiler happy by casting the ValueVector to BaseDataValueVector and then do ((BaseDataValueVector)(v of type ValueVector)).getBuffer(). This code broke since NullableValueVectors are no longer a subtype of BaseDataValueVector -- the inheritance hierarchy was changed as part of ARROW-276. Solution: Similar to what was done in ARROW-801, we have new methods at ValueVector interface to get the underlying buffer. ValueVector has always had the methods getBuffers(), getBufferSizeFor(), getBufferSize(), so it makes sense to augment the ValueVector interface with new APIs. It looks like new unit tests are not needed since the unit tests added for ARROW-801 test the new APIs as well --> getDataBufferAddress() underneath invokes getDataBuffer() to get the memory address of ArrowBuf so we are good. 
Author: siddharth <siddharth@dremio.com> Closes apache#976 from siddharthteotia/ARROW-1373 and squashes the following commits: 1ef2022 [siddharth] Fixed failures and added javadocs e5ff023 [siddharth] ARROW-1373: Implement getBuffer() methods for ValueVector
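The shape of that change can be sketched with simplified types. Only the three getter names come from the patch summary; `SketchValueVector`, `FixedWidthSketch` and `NullableSketch` are hypothetical classes, `ByteBuffer` stands in for `ArrowBuf`, and throwing `UnsupportedOperationException` for a buffer a vector does not have is an assumption of this sketch.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified counterpart of the ValueVector interface.
interface SketchValueVector {
  ByteBuffer getValidityBuffer();
  ByteBuffer getDataBuffer();
  ByteBuffer getOffsetBuffer();
}

// A non-nullable fixed-width vector: it only owns a data buffer.
final class FixedWidthSketch implements SketchValueVector {
  private final ByteBuffer data;

  FixedWidthSketch(int bytes) {
    this.data = ByteBuffer.allocate(bytes);
  }

  @Override
  public ByteBuffer getValidityBuffer() {
    throw new UnsupportedOperationException("no validity buffer");
  }

  @Override
  public ByteBuffer getDataBuffer() {
    return data;
  }

  @Override
  public ByteBuffer getOffsetBuffer() {
    throw new UnsupportedOperationException("no offset buffer");
  }
}

// A nullable vector: it owns the validity bitmap and delegates the rest
// to the wrapped values vector, without extending any common base class.
final class NullableSketch implements SketchValueVector {
  private final ByteBuffer validity;
  private final SketchValueVector values;

  NullableSketch(SketchValueVector values, int valueCount) {
    this.values = values;
    this.validity = ByteBuffer.allocate((valueCount + 7) / 8); // one bit per value
  }

  @Override
  public ByteBuffer getValidityBuffer() {
    return validity;
  }

  @Override
  public ByteBuffer getDataBuffer() {
    return values.getDataBuffer();
  }

  @Override
  public ByteBuffer getOffsetBuffer() {
    return values.getOffsetBuffer();
  }
}
```

Code holding only a `SketchValueVector` reference can then call `getDataBuffer()` directly, with no cast to a concrete base class.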

Closes apache#977 Change-Id: I494db4952036a8e52078f1d698d003904f91a34f

The method for starting the Plasma store is already documented in https://arrow.apache.org/docs/python/plasma.html. So far it only worked if the store was installed with "make install" from the C++ sources. This makes it also possible to start it if the pyarrow wheels are installed. Author: Philipp Moritz <pcmoritz@gmail.com> Closes apache#975 from pcmoritz/plasma-store-ep and squashes the following commits: eddc487 [Philipp Moritz] make plasma store entry point private 4c05140 [Philipp Moritz] define entry point for the plasma store

…a put performance This PR makes it possible to use Plasma object store backed by a pre-mounted hugetlbfs. Author: Philipp Moritz <pcmoritz@gmail.com> Author: Alexey Tumanov <atumanov@gmail.com> Closes apache#974 from atumanov/putperf and squashes the following commits: 077b78f [Philipp Moritz] add more comments 5aa4b0d [Philipp Moritz] preflight script formatting changes 22188a6 [Philipp Moritz] formatting ffb9916 [Philipp Moritz] address comments 225429b [Philipp Moritz] update documentation with Alexey's fix 713a0c4 [Philipp Moritz] add missing includes 4c976bb [Philipp Moritz] make format fb8e1b4 [Philipp Moritz] add helpful error message 7260d59 [Philipp Moritz] expose number of threads to python and try out cleanups 98b603e [Alexey Tumanov] map_populate on linux; fall back to mlock/memset otherwise ce90ef4 [Alexey Tumanov] documenting new plasma store info fields c52f211 [Philipp Moritz] cleanups (TODO: See if memory locking helps) 4702703 [Philipp Moritz] preliminary documentation 3073a99 [Alexey Tumanov] reenable hashing a20ca56 [Alexey Tumanov] fix bug dd04b87 [Alexey Tumanov] [arrow][putperf] enable HUGETLBFS support on linux

…he Arrow This PR adds the capability to serialize a large class of (nested) Python objects in Apache Arrow. The eventual goal is to evolve this into a more modern version of pickle that will make it possible to read the data from other languages supported by Apache Arrow (and might also be faster). Currently we support lists, tuples, dicts, strings, numpy objects, Python classes and namedtuples. A fallback to (cloud-)pickle can be provided for objects that cannot be natively represented in Arrow (for example lambdas). Numpy data within objects is efficiently represented using Arrow's Tensor facilities and for the nested Python sequences we use Arrow's UnionArray. There are many loose ends that will need to be addressed in follow up PRs. Author: Philipp Moritz <pcmoritz@gmail.com> Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#965 from pcmoritz/python-serialization and squashes the following commits: 31486ed [Wes McKinney] Fix typo 2164db7 [Wes McKinney] Add SerializedPyObject to public API b70235c [Wes McKinney] Add pyarrow.deserialize convenience method a6a402e [Wes McKinney] Memory map fixture robustness on Windows 114a5fb [Wes McKinney] Add a Python container for the SerializedPyObject data, total_bytes method 8e59617 [Wes McKinney] Use pytest tmpdir for large memory map fixture so works on Windows 8a42f30 [Wes McKinney] Add doxygen comment to set_serialization_callbacks a9522c5 [Wes McKinney] Refactoring, address code review comments. fix flake8 issues ce5784d [Wes McKinney] Do not use ARROW_CHECK in production code. Consolidate python_to_arrow code c8efef9 [Wes McKinney] Fix various Clang compiler warnings due to integer conversions. 
clang-format 831e2f2 [Philipp Moritz] remove sequence.h 54af39b [Philipp Moritz] more fixes a6fdb76 [Philipp Moritz] make tests work fe56c73 [Philipp Moritz] fixes 84d62f6 [Philipp Moritz] more fixes 49aba8a [Philipp Moritz] make it compile on windows aa1f300 [Philipp Moritz] linting 95cb9da [Philipp Moritz] fix GIL adcc8f7 [Philipp Moritz] shuffle stuff around bcebdfe [Philipp Moritz] fix longlong vs int64 and unsigned variant 4cc45cd [Philipp Moritz] cleanup f25f3f3 [Philipp Moritz] cleanups a88d410 [Philipp Moritz] convert DESERIALIZE_SEQUENCE back to a macro c425978 [Philipp Moritz] prevent possible memory leaks aeafd82 [Philipp Moritz] fix callbacks 389bfc6 [Philipp Moritz] documentation 2f0760c [Philipp Moritz] fix api faf9a3e [Philipp Moritz] make exported API more consistent e1fc0c5 [Philipp Moritz] restructure c1f377b [Philipp Moritz] more fixes 3e94e6d [Philipp Moritz] clang-format 99e2d1a [Philipp Moritz] cleanups 3298329 [Philipp Moritz] mutable refs and small fixes e73c1ea [Philipp Moritz] make DictBuilder private 3929273 [Philipp Moritz] increase Py_True refcount and hide helper methods aaf6f09 [Philipp Moritz] remove code duplication c38c58d [Philipp Moritz] get rid of leaks and clarify reference counting for dicts 74b9e46 [Philipp Moritz] convert DESERIALIZE_SEQUENCE to a template 080db03 [Philipp Moritz] fix first few comments a6105d2 [Philipp Moritz] lint fix 802e739 [Philipp Moritz] clang-format 2e08de4 [Philipp Moritz] fix namespaces 91b57d5 [Philipp Moritz] fix linting c4782ac [Philipp Moritz] fix 7069e20 [Philipp Moritz] fix imports 2171761 [Philipp Moritz] fix python unicode string 30bb960 [Philipp Moritz] rebase f229d8d [Philipp Moritz] serialization of custom objects 8b2ffe6 [Philipp Moritz] working version bd36c83 [Philipp Moritz] handle very long longs with custom serialization callback 49a4acb [Philipp Moritz] roundtrip working for the first time 44fb98b [Philipp Moritz] work in progress 3af1c67 [Philipp Moritz] deserialization path 
(need to figure out if base object and refcounting is handled correctly) deb3b46 [Philipp Moritz] rename serialization entry point 5766b8c [Philipp Moritz] python to arrow serialization

… back to pandas form Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#979 from wesm/ARROW-1357 and squashes the following commits: 8318a12 [Wes McKinney] Use PyLong_FromLongLong so Windows is happy 18acdd9 [Wes McKinney] Account for chunked arrays when converting lists back to pandas form

Author: Max Risuhin <risuhin.max@gmail.com> Closes apache#980 from MaxRis/ARROW-1375 and squashes the following commits: f5e4156 [Max Risuhin] ARROW-1375: [C++] Remove dependency on msvc version for Snappy build

…UDA tests This is an optional leaf library for users who want to use Arrow data on graphics cards. See parent JIRA ARROW-1055 for a roadmap for some basic GPU extensions Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#982 from wesm/arrow-gpu-lib and squashes the following commits: f8c00eb [Wes McKinney] Remove cruft from CMakeLists.txt e8f04a8 [Wes McKinney] Set up libarrow_gpu, add simple unit test that allocates memory on device Change-Id: Ia1851ea6f30cb7cf3de422779d2d029e4ded672f

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#983 from wesm/ARROW-1395 and squashes the following commits: c105a21 [Wes McKinney] Remove deprecated APIs from <= 0.4.0

…atch as an IPC message to a new buffer There's also a little bit of API scrubbing as I went through this code. Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#984 from wesm/ARROW-1384 and squashes the following commits: a3996fe [Wes McKinney] Add DCHECK to catch unequal schemas 2952cfb [Wes McKinney] Add SerializeRecordBatch API, various API scrubbing, make some integer arguments const

This makes it easy to write from host to device and read from device to host. We also need a zero-copy device reader for IPC purposes (where we don't want to move any data to the host), can do that in a subsequent patch. Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#985 from wesm/ARROW-1392 and squashes the following commits: ae24cb5 [Wes McKinney] Add section to C++ README about building libarrow_gpu 229a268 [Wes McKinney] Refactor CudaBufferReader to return zero-copy device pointers. Add unit tests 415157a [Wes McKinney] Make Tell overrides in arrow-glib const 5daa59e [Wes McKinney] Add cuda-benchmark module 1cf1196 [Wes McKinney] Test CudaBuffer::CopyFromHost a2708f2 [Wes McKinney] Implement IO interfaces for CUDA buffers

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#989 from wesm/ARROW-1386 and squashes the following commits: be3b53a [Wes McKinney] Unpin CMake version in MSVC toolchain builds now that 3.9.1 is in conda-forge

…f sign bit * Reimplement Decimal128 types to use the Int128 type as the underlying integer representation, adapted from the Apache ORC project's C++ in memory format. This enables us to write integration tests and results in an in-memory Decimal128 format that is compatible with the Java implementation * Additionally, this PR also fixes Decimal slice comparison and adds related regression tests * Follow-ups include ARROW-695 (C++ Decimal integration tests), ARROW-696 (JSON read/write support for decimals) and ARROW-1238 (Java Decimal integration tests). Author: Phillip Cloud <cpcloud@gmail.com> Closes apache#981 from cpcloud/decimal-rewrite and squashes the following commits: 53ce04b [Phillip Cloud] Formatting fe13ef3 [Phillip Cloud] Remove redundant constructor 86db184 [Phillip Cloud] Subclass from FixedSizeBinaryArray for code reuse 535f9ff [Phillip Cloud] Use a macro for cases 1cc43ce [Phillip Cloud] Use CHAR_BIT 355fb24 [Phillip Cloud] Include the correct header for _BitScanReverse b53d7cd [Phillip Cloud] Share comparison code 162eeeb [Phillip Cloud] BUG: Double export b98c894 [Phillip Cloud] BUG: Export symbols be220c8 [Phillip Cloud] Cast so we have enough space to contain the integer 5716010 [Phillip Cloud] Cast 18 to matching type size_t for msvc 8833904 [Phillip Cloud] Remove unnecessary args to sto* function calls 628ce85 [Phillip Cloud] Fix more docs e4a1792 [Phillip Cloud] More const 8ecb315 [Phillip Cloud] Formatting 178d3f2 [Phillip Cloud] NOLINT for MSVC specific and necessary types 38c9b50 [Phillip Cloud] Fix doc style in int128.h and add const where possible 2930d7b [Phillip Cloud] Fix naming convention in decimal-test.cc 1eab5c4 [Phillip Cloud] Remove unnecessary header from CMakeLists.txt 22eda4b [Phillip Cloud] kMaximumPrecision 9af97d8 [Phillip Cloud] MSVC fix 349dc58 [Phillip Cloud] ARROW-786: [Format] In-memory format for 128-bit Decimals, handling of sign bit

…chema, ReadSchema public APIs This is mostly moving code around. In reviewing I recommend focusing on the public headers. There were a number of places where it is more consistent to use naked pointers versus shared_ptr. Also some constructors were returning shared_ptr to subclass, where it would be simpler for clients to return a pointer to base. This includes ARROW-1376 and ARROW-1406 Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#988 from wesm/ARROW-1408 and squashes the following commits: b156767 [Wes McKinney] Fix up glib bindings, undeprecate some APIs 4bdebfa [Wes McKinney] Add serialize methods to RecordBatch, Schema. Test round trip ef12e0f [Wes McKinney] Fix a valgrind warning 73d30c9 [Wes McKinney] Better comments 8597b96 [Wes McKinney] Remove API that was never intended to be public, unlikely to be used anywhere 122a759 [Wes McKinney] Refactoring sweep and cleanup of public IPC API. Move non-public APIs from metadata.h to metadata-internal.h and create message.h, dictionary.h b646f96 [Wes McKinney] Set device in more places

@pcmoritz

…ore. cc @pcmoritz @atumanov Author: Robert Nishihara <robertnishihara@gmail.com> Closes apache#992 from robertnishihara/removemappopulate and squashes the following commits: 8ed9612 [Robert Nishihara] Remove unnecessary ifdef. 7b75bd9 [Robert Nishihara] Remove MAP_POPULATE flag when mmapping files in Plasma store.

…rted dependency errors There are a couple of dependency issues in the current Maven config. These leak into integrating projects, which then need to specify foreign dependencies because Arrow does not list them properly, or pull in unnecessary dependencies because Arrow lists them improperly.

* ```arrow-format```
```
[WARNING] Unused declared dependencies found:
[WARNING] org.slf4j:slf4j-api:jar:1.7.25:compile
[WARNING] com.vlkan:flatbuffers:jar:1.2.0-3f79e055:compile
[WARNING] io.netty:netty-handler:jar:4.0.49.Final:compile
[WARNING] com.google.guava:guava:jar:18.0:compile
```
* ```arrow-memory```
```
[WARNING] Used undeclared dependencies found:
[WARNING] io.netty:netty-buffer:jar:4.0.49.Final:compile
[WARNING] io.netty:netty-common:jar:4.0.49.Final:compile
[WARNING] Unused declared dependencies found:
[WARNING] com.carrotsearch:hppc:jar:0.7.2:compile
[WARNING] io.netty:netty-handler:jar:4.0.49.Final:compile
```
* ```arrow-tools```
```
[WARNING] Used undeclared dependencies found:
[WARNING] com.fasterxml.jackson.core:jackson-databind:jar:2.7.9:compile
[WARNING] com.fasterxml.jackson.core:jackson-core:jar:2.7.9:compile
[WARNING] Unused declared dependencies found:
[WARNING] org.apache.commons:commons-lang3:jar:3.6:compile
[WARNING] org.apache.arrow:arrow-format:jar:0.7.0-SNAPSHOT:compile
[WARNING] io.netty:netty-handler:jar:4.0.49.Final:compile
```
* ```arrow-vector```
```
[WARNING] Used undeclared dependencies found:
[WARNING] com.google.code.findbugs:jsr305:jar:3.0.2:compile
[WARNING] com.vlkan:flatbuffers:jar:1.2.0-3f79e055:compile
[WARNING] io.netty:netty-common:jar:4.0.49.Final:compile
[WARNING] io.netty:netty-buffer:jar:4.0.49.Final:compile
[WARNING] com.fasterxml.jackson.core:jackson-core:jar:2.7.9:compile
[WARNING] Unused declared dependencies found:
[WARNING] org.apache.commons:commons-lang3:jar:3.6:compile
[WARNING] io.netty:netty-handler:jar:4.0.49.Final:compile
```

I am proposing this PR to:
1. Add maven-dependency-plugin to enforce that all dependencies are always listed correctly
2. Fix all the current dependency issues

Author: Antony Mayi <antonymayi@yahoo.com> Author: Stepan Kadlec <stepan.kadlec@oracle.com> Closes apache#978 from antonymayi/master and squashes the following commits: d7f081e [Antony Mayi] moving `copy-flatc` to initialize phase and `analyze` execution to parent pom ec72717 [Antony Mayi] removing unused apache.commons.lang3, fixing pom 8cbfe5f [Antony Mayi] maven-dependency-plugin: ignoring dependencies of generated sources in arrow-vector dc833bb [Stepan Kadlec] adding maven-dependency-plugin and fixing all reported dependency errors

Author: Phillip Cloud <cpcloud@gmail.com> Closes apache#993 from cpcloud/ARROW-1411 and squashes the following commits: 741269f [Phillip Cloud] ARROW-1411: [Python] Booleans in Float Columns cause Segfault

When configured, this looks like: ``` #define ARROW_CUDA_ABI_VERSION_MAJOR 8 #define ARROW_CUDA_ABI_VERSION_MINOR 0 ``` I'm not sure how to use this yet. It would be nice if we could work out how to enable thirdparty users to detect incompatibility with their nvcc at compiler time Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#990 from wesm/ARROW-1399 and squashes the following commits: 1ad6966 [Wes McKinney] Add CUDA build version defines in public headers

Apache Arrow C++ uses int as result type for expression that uses size_t. It causes sign-conversion warning but the coding style is expected. Example: .../arrow/buffer.h:296:41: warning: implicit conversion changes signedness: 'unsigned long' to 'int64_t' (aka 'long') [-Wsign-conversion] int64_t length() const { return size_ / sizeof(T); } ~~~~~~ ~~~~~~^~~~~~~~~~~ Author: Kouhei Sutou <kou@clear-code.com> Closes apache#999 from kou/glib-suppress-warning-on-clang and squashes the following commits: 397490e [Kouhei Sutou] [GLib] Suppress sign-conversion warnings

This PR slightly reduces ambiguity in the array example for null bitmaps. The original example was left/right symmetric; this PR changes the example to break that symmetry. Asymmetry is important since readers who skip the byte endianness section could have interpreted the bitmap buffer in two distinct ways: left-to-right with an offset of 3 (wrong), or right-to-left with zero offset (correct). Author: Fritz Obermeyer <fritz.obermeyer@gmail.com> Closes apache#998 from fritzo/patch-1 and squashes the following commits: af3dcbd [Fritz Obermeyer] Clarify memory layout documentation
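The "right-to-left with zero offset" reading corresponds to least-significant-bit numbering: value `i` is tracked by bit `i % 8` of byte `i / 8`. A small sketch of that addressing follows; the `ValidityBitmap` helper class is hypothetical and is not part of Arrow's Java API.

```java
// Hypothetical helper illustrating least-significant-bit numbering
// for a validity (null) bitmap stored as a byte array.
final class ValidityBitmap {
  private ValidityBitmap() {}

  // True if the bit for the value at `index` is set (value is non-null).
  static boolean isValid(byte[] bitmap, int index) {
    return ((bitmap[index >> 3] >> (index & 7)) & 1) == 1;
  }

  // Marks the value at `index` as non-null.
  static void setValid(byte[] bitmap, int index) {
    bitmap[index >> 3] |= (byte) (1 << (index & 7));
  }
}
```

As an illustrative case (not the array used in the documentation), marking indices 0, 1, 3 and 5 of a six-element array as valid yields the byte `00101011`: read right to left, the bits line up with indices 0 through 7.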

Author: Kouhei Sutou <kou@clear-code.com> Closes apache#996 from kou/glib-cast-after-status-check and squashes the following commits: 02b59db [Kouhei Sutou] [GLib] Cast after status check

…hon objects Note that this PR breaks the PlasmaClient API (which is still unstable at this point, so this is acceptable). It renames PlasmaClient.get to PlasmaClient.get_buffers and introduces two new functions, PlasmaClient.put and PlasmaClient.get which can put Python objects into the object store and provide access to their content. The old get was renamed to get_buffers because most users will want to use the new get method and therefore it should have the more concise name. There is some freedom in designing the API; I tried to make it so there is a unified API between getting one and multiple objects (the latter is supported to limit the number of IPC roundtrips with the plasma store when we get many small objects). I also introduced a special object that is returned if one of the objects was not available within the timeout. We could use "None" here, but then it would be hard to distinguish between getting a "None" object and a timeout. Author: Philipp Moritz <pcmoritz@gmail.com> Closes apache#995 from pcmoritz/plasma-putget and squashes the following commits: bd24e01 [Philipp Moritz] add documentation e60ea73 [Philipp Moritz] get_buffer -> get_buffers and update example 8c36903 [Philipp Moritz] support full API 5921148 [Philipp Moritz] move put and get into PlasmaClient cf4bf24 [Philipp Moritz] add type information 0049c67 [Philipp Moritz] fix flake8 linting 44c3b3d [Philipp Moritz] fixes 20b119e [Philipp Moritz] make it possible to get single objects 36f67d6 [Philipp Moritz] implement ObjectID.from_random c044954 [Philipp Moritz] add documentation eb9694a [Philipp Moritz] implement timeouts 3518c71 [Philipp Moritz] fix e1924a4 [Philipp Moritz] add put and get 44ada47 [Philipp Moritz] export symbols

…o GPU device memory This additionally does a few things: * Change libarrow_gpu to use CUDA driver API instead of runtime API * Adds code for exporting buffers using CUDA IPC on Linux, but this is not yet tested Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1000 from wesm/ARROW-1364 and squashes the following commits: e436755 [Wes McKinney] Add newline at end of file a8812af [Wes McKinney] Complete basic IPC message and record batch reads on GPU device memory 16d628f [Wes McKinney] More Arrow IPC scaffolding 591aceb [Wes McKinney] Draft SerializeRecordBatch for CUDA 84e4525 [Wes McKinney] Add classes and methods for simplifying use of CUDA IPC machinery. No tests yet 508febb [Wes McKinney] Test suite passing again f3c724e [Wes McKinney] Get things compiling / linking using driver API 5d686fe [Wes McKinney] More progress 2840c60 [Wes McKinney] Progress 3a37fdf [Wes McKinney] Start cuda context class 03d0baf [Wes McKinney] Start cuda_ipc file

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1238 from wesm/ARROW-1654 and squashes the following commits: 2e6f9e3 [Wes McKinney] Add pickling test cases for timestamp, decimal 1827b23 [Wes McKinney] Fix pickling on py27, implement for Schema. Also pickle field/schema metadata 1395583 [Wes McKinney] Implement pickling for list, struct, add __richcmp__ for Field 366f428 [Wes McKinney] Start implementing pickling for DataType, Field

wesm · 2017-10-23T22:29:08Z

It seems like this is still an interesting optional extension. @bigdata-memory are you interested in rebasing this and making this an optional extension (arrow-mnemonic)?

bigdata-memory · 2017-10-23T23:09:55Z

@wesm sure, I will do it, Thanks.

This closes [ARROW-1720](https://issues.apache.org/jira/browse/ARROW-1720). Author: Licht-T <licht-t@outlook.jp> Closes apache#1243 from Licht-T/fix-unbound-chunk and squashes the following commits: cabdd43 [Licht-T] TST: Add bounds check tests for chunk getter bda7f4c [Licht-T] BUG: Implement bounds check in chunk getter

This got messed up during one of the patches in which these files were refactored. Once the build fails, I will fix the lint errors Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1242 from wesm/ARROW-1711 and squashes the following commits: cd4b655 [Wes McKinney] Fix more flake8 warnings 2eb8bf4 [Wes McKinney] Fix flake8 issues cef7a7c [Wes McKinney] Fix flake8 calls to lint the right directories

@TobyShaw

… to avoid using nullptr in public headers cc @TobyShaw. Can you test this? Close apache#1098 Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1228 from wesm/ARROW-1134 and squashes the following commits: bf18158 [Wes McKinney] Only define NULLPTR if not already defined a51dd88 [Wes McKinney] Add NULLPTR macro to avoid using nullptr in public headers for C++/CLI users

Author: Phillip Cloud <cpcloud@gmail.com> Closes apache#1211 from cpcloud/ARROW-1588 and squashes the following commits: ae0d562 [Phillip Cloud] ARROW-1588: [C++/Format] Harden Decimal Format

Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1247 from kou/c-glib-release-verify and squashes the following commits: e9f2307 [Kouhei Sutou] [GLib] Add setup description to verify C GLib build

bigdata-memory · 2017-10-25T22:24:46Z

Hi, please take a look at this PR. It compiles when tests are skipped, but running the tests reports many errors like the following. Please help, thanks!
"java.lang.NoClassDefFoundError: Could not initialize class org.apache.arrow.memory.RootAllocator"

bigdata-memory · 2017-10-25T23:54:50Z

I found that mnemonic-pmalloc-service must be kept separate from the dependencies.

…note that move allocator services to the service-dist folder as the properties indicated in pom.xml.

bigdata-memory · 2017-10-26T04:33:36Z

Compilation and tests now pass. Please review, thanks!

wesm · 2017-11-01T02:42:14Z

@jacques-n @siddharthteotia could someone from the Java side take a look at this? So long as it does not conflict with normal users of Arrow, giving the option to experiment with non-volatile memory to users seems like a reasonable idea. I'm not personally qualified to review the Java code

jacques-n · 2017-11-02T15:30:18Z

I think we should look at doing this in a cleaner way. Having setters on a static interface seems like a bit of a hack. I also think it probably makes sense to expose a location property (or similar) as well as an ability to move memory between domains. A good way might be to have an optional constructor for RootAllocator with a new interface. The default could have a wrapped version of the existing static pooled udle allocator.

The allocator capacity should also be tied to the subsystem. Right now I think we're constrained by the direct memory capacity of the JVM, but that may not be true when we're using the other allocator.

Also, any idea on the performance using the alternative allocator? Does Mnemonic have its own intelligent allocator? The normal path uses a nice allocator to manage various size chunks. The model presented here operates above that allocator (it seems like maybe it should be below the netty allocator and used for chunk allocations rather than final allocations) and thus I wonder how smaller allocations would work (I don't know Mnemonic well).
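To make the "below the allocator" layering concrete, here is a rough sketch in which a backing source is only asked for large chunks and small buffers are carved out of them. `ChunkedSliceAllocator` is hypothetical, and it is a simple bump allocator with no freeing or size classes, so it is far simpler than Netty's pooled arenas; any chunk source, including a Mnemonic-backed one, is assumed to fit the `IntFunction<ByteBuffer>` shape.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

// Hypothetical bump allocator: requests fixed-size chunks from a backing
// source and hands out small slices of the current chunk.
final class ChunkedSliceAllocator {
  private final IntFunction<ByteBuffer> chunkSource;
  private final int chunkSize;
  private ByteBuffer current;
  private int chunksRequested;

  ChunkedSliceAllocator(IntFunction<ByteBuffer> chunkSource, int chunkSize) {
    this.chunkSource = chunkSource;
    this.chunkSize = chunkSize;
  }

  ByteBuffer allocate(int size) {
    if (size <= 0 || size > chunkSize) {
      throw new IllegalArgumentException("size must be in 1.." + chunkSize);
    }
    // Only touch the backing source when the current chunk is exhausted.
    if (current == null || current.remaining() < size) {
      current = chunkSource.apply(chunkSize);
      chunksRequested++;
    }
    ByteBuffer slice = current.slice(); // view starting at the current position
    slice.limit(size);                  // restrict the view to the requested size
    current.position(current.position() + size);
    return slice;
  }

  int chunksRequested() {
    return chunksRequested;
  }
}
```

With a 64-byte chunk size, four 16-byte allocations are served from one chunk and the fifth triggers a second chunk request, which is the property that keeps small allocations cheap regardless of how expensive the backing source is.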

It seems like people should be able to inspect as well as change the memory tier that an ArrowBuf can be located in. For example, move a buffer between memory, nvme and disk. Thoughts? In this case we need to think more about how we manage allocation tracking. You'd potentially want to have constraints and/or reservations per domain.
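One possible shape for per-domain tracking, sketched under stated assumptions: `MemoryTier` and `TierAccountant` are hypothetical names, the three tiers mirror the memory/nvme/disk example, and only the bookkeeping is modelled (no data is copied).

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical tiers, mirroring the memory / nvme / disk example.
enum MemoryTier { DRAM, NVME, DISK }

// Hypothetical per-tier accounting: each tier has its own limit and usage.
final class TierAccountant {
  private final Map<MemoryTier, Long> limits = new EnumMap<>(MemoryTier.class);
  private final Map<MemoryTier, Long> used = new EnumMap<>(MemoryTier.class);

  void setLimit(MemoryTier tier, long bytes) {
    limits.put(tier, bytes);
  }

  long used(MemoryTier tier) {
    return used.getOrDefault(tier, 0L);
  }

  void reserve(MemoryTier tier, long bytes) {
    long next = used(tier) + bytes;
    if (next > limits.getOrDefault(tier, 0L)) {
      throw new IllegalStateException("limit exceeded for " + tier);
    }
    used.put(tier, next);
  }

  void release(MemoryTier tier, long bytes) {
    used.put(tier, used(tier) - bytes);
  }

  // Reserve in the destination first, so a rejected move leaves the
  // source tier's accounting untouched.
  void move(MemoryTier from, MemoryTier to, long bytes) {
    reserve(to, bytes);
    release(from, bytes);
  }
}
```

Reserving before releasing means a move into a full tier fails cleanly instead of leaving a buffer accounted for nowhere.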

I would also prefer not to make Mnemonic a required dependency. Seems like we should look at how we can make it optional. If we do something more interface based at the RootAllocator level, this should be possible.
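A hedged sketch of what an interface-based, optional backend could look like. `BackingAllocator`, `DirectBackingAllocator` and `SketchRootAllocator` are hypothetical names, not Arrow classes, and using `java.util.ServiceLoader` for discovery is an assumption of this sketch rather than something proposed in the thread. The `location()` and `capacity()` methods correspond to the location property and subsystem-specific capacity mentioned above.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.Objects;
import java.util.ServiceLoader;

// Hypothetical service interface; an optional module could provide
// an implementation without the core depending on it.
interface BackingAllocator {
  String location();   // e.g. "direct" or "nvm"
  long capacity();     // capacity of this particular subsystem, in bytes
  ByteBuffer allocate(int size);
}

// Default backend: JVM direct memory with a caller-supplied capacity.
final class DirectBackingAllocator implements BackingAllocator {
  private final long capacity;

  DirectBackingAllocator(long capacity) {
    this.capacity = capacity;
  }

  @Override public String location() { return "direct"; }
  @Override public long capacity() { return capacity; }
  @Override public ByteBuffer allocate(int size) { return ByteBuffer.allocateDirect(size); }
}

// Hypothetical stand-in for RootAllocator.
final class SketchRootAllocator {
  private final BackingAllocator backing;

  // Default: use a discovered provider if one is on the classpath,
  // otherwise fall back to direct memory.
  SketchRootAllocator() {
    this(discover());
  }

  // Optional constructor: the caller picks the backend explicitly.
  SketchRootAllocator(BackingAllocator backing) {
    this.backing = Objects.requireNonNull(backing, "backing");
  }

  static BackingAllocator discover() {
    Iterator<BackingAllocator> it = ServiceLoader.load(BackingAllocator.class).iterator();
    return it.hasNext() ? it.next() : new DirectBackingAllocator(1L << 20);
  }

  String location() {
    return backing.location();
  }

  ByteBuffer buffer(int size) {
    if (size > backing.capacity()) {
      throw new IllegalArgumentException("request exceeds " + backing.location() + " capacity");
    }
    return backing.allocate(size);
  }
}
```

With no provider registered, the no-argument constructor behaves exactly like the direct-memory default, so the alternative backend adds no required dependency.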

bigdata-memory · 2017-11-03T20:38:24Z

Mnemonic has three allocators, i.e. VolatileMemAllocator, NonVolatileMemAllocator and SysMemAllocator, all of which rely on qualified memory services. These allocators abstract the fundamental interface operations for DOM and DCM. How smaller allocations would work depends entirely on the implementation of the specific memory service. An action such as moving a buffer between memory domains might be handled later by Mnemonic directly or by Arrow itself. I think this kind of action could be pretty straightforward because there may be no customizable links between ArrowBufs.

bigdata-memory · 2017-11-03T20:42:01Z

Regarding the optional dependency, I think we need to design a well-defined mechanism to make it possible. Mnemonic already provides one and will define a schema to make this more flexible.

wesm · 2018-11-04T18:12:02Z

Closing this as stale for now

fix SendCreateRequest, miss a parameter

This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). #7131 enabled a minimal set of tests as a starting point. I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest
...
Start 1: arrow-array-test
1/51 Test #1: arrow-array-test ..................... Passed 4.62 sec
Start 2: arrow-buffer-test
2/51 Test #2: arrow-buffer-test .................... Passed 0.14 sec
Start 3: arrow-extension-type-test
3/51 Test #3: arrow-extension-type-test ............ Passed 0.12 sec
Start 4: arrow-misc-test
4/51 Test #4: arrow-misc-test ...................... Passed 0.14 sec
Start 5: arrow-public-api-test
5/51 Test #5: arrow-public-api-test ................ Passed 0.12 sec
Start 6: arrow-scalar-test
6/51 Test #6: arrow-scalar-test .................... Passed 0.13 sec
Start 7: arrow-type-test
7/51 Test #7: arrow-type-test ...................... Passed 0.14 sec
Start 8: arrow-table-test
8/51 Test #8: arrow-table-test ..................... Passed 0.13 sec
Start 9: arrow-tensor-test
9/51 Test #9: arrow-tensor-test .................... Passed 0.13 sec
Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test ............. Passed 0.16 sec
Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test ....................... Passed 0.12 sec
Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ............... Passed 0.53 sec
Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ...................... Passed 1.45 sec
Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test .................. Passed 0.18 sec
Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ............... Passed 0.20 sec
Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test ............. Passed 3.48 sec
Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ................... Passed 0.74 sec
Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ................... Passed 0.12 sec
Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test ................. Passed 2.77 sec
Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed 5.65 sec
Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test ......... Passed 1.34 sec
Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ........... Passed 0.13 sec
Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ........... Passed 0.15 sec
Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test .............. Passed 0.22 sec
Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test .............. Passed 2.61 sec
Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test .............. Passed 0.81 sec
Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test ............. Passed 0.40 sec
Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ... Passed 3.33 sec
Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test .... Passed 1.51 sec
Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test ..... Passed 0.13 sec
Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ............... Passed 0.12 sec
Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test ......... Passed 14.70 sec
Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ........... Passed 7.96 sec
Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test .............. Passed 4.80 sec
Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............ Passed 8.23 sec
Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ........... Passed 0.25 sec
Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test ......... Passed 0.13 sec
Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test .......... Passed 0.21 sec
Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test .............. Passed 0.12 sec
Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............ Passed 0.16 sec
Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test ......... Passed 0.13 sec
Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ........... Passed 0.20 sec
Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................ Passed 1.62 sec
Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ...................... Passed 0.13 sec
Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ................... Passed 0.91 sec
Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............ Passed 5.77 sec
Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ........... Passed 0.16 sec
Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test .................. Passed 0.27 sec
Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test .......... Passed 0.13 sec
Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ...................... Passed 0.26 sec
Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ............... Passed 1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests = 27.38 sec (27 tests)
arrow_compute = 45.11 sec (14 tests)
arrow_dataset = 1.21 sec (7 tests)
arrow_ipc = 6.20 sec (3 tests)
unittest = 79.91 sec (51 tests)

Total Test time (real) = 79.99 sec

The following tests FAILED:
20 - arrow-utility-test (Failed)
Errors while running CTest
```

Closes #7142 from kiszk/ARROW-8754
Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>

…n pushdow, and refactor some classes (apache#36)

* Add toString to Time obj in Time#toString
* Improve Time toString
* Fix maven plugins
* Revert "Update java/flight/flight-jdbc-driver/src/test/java/org/apache/arrow/driver/jdbc/accessor/impl/calendar/ArrowFlightJdbcTimeStampVectorAccessorTest.java"
  This reverts commit 00808c0.
* Revert "Merge pull request apache#29 from rafael-telles/Timestamp_fix"
  This reverts commit 7924e7b, reversing changes made to f6ac593.
* Fix DateTime for negative epoch
* Remove unwanted change
* Fix negative timestamp shift
* Fix coverage
* Refator DateTimeUtilsTest

julienledem reviewed Aug 8, 2016

kou and others added 29 commits August 16, 2017 09:17

ARROW-1356: [Website] Add new committers

c2fb9cb

Author: Kouhei Sutou <kou@clear-code.com> Closes apache#968 from kou/add-new-committers and squashes the following commits: 710558b [Kouhei Sutou] [Website] Add new committers

[C++] DOC: Fix a typo in plasma.md

4471dc9

Closes apache#970 Change-Id: I49ea3f7f99d080c517fb21b86b7a27e17b04e20b

[Python] DOC: Fix Parquet docs to use pyarrow.parquet namespace for w…

c0fa8e0

…rite_table Closes apache#971 Change-Id: I7c689b200a4f04af51928f6765362fef52c613e8

[C++] Fix a typo in in plasma.md

e1bad9f

Closes apache#977 Change-Id: I494db4952036a8e52078f1d698d003904f91a34f

ARROW-1375: [C++] Remove dependency on msvc version for Snappy build

6ad976e

Author: Max Risuhin <risuhin.max@gmail.com> Closes apache#980 from MaxRis/ARROW-1375 and squashes the following commits: f5e4156 [Max Risuhin] ARROW-1375: [C++] Remove dependency on msvc version for Snappy build

ARROW-1395: [C++/Python] Remove APIs deprecated from 0.5.0 onward

5303594

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#983 from wesm/ARROW-1395 and squashes the following commits: c105a21 [Wes McKinney] Remove deprecated APIs from <= 0.4.0

ARROW-1411: [Python] Booleans in Float Columns cause Segfault

b36aab5

Author: Phillip Cloud <cpcloud@gmail.com> Closes apache#993 from cpcloud/ARROW-1411 and squashes the following commits: 741269f [Phillip Cloud] ARROW-1411: [Python] Booleans in Float Columns cause Segfault

ARROW-1414: [GLib] Cast after status check

32e2668

Author: Kouhei Sutou <kou@clear-code.com> Closes apache#996 from kou/glib-cast-after-status-check and squashes the following commits: 02b59db [Kouhei Sutou] [GLib] Cast after status check

Licht-T and others added 5 commits October 24, 2017 12:41

ARROW-1588: [C++/Format] Harden Decimal Format

b2596f6

Author: Phillip Cloud <cpcloud@gmail.com> Closes apache#1211 from cpcloud/ARROW-1588 and squashes the following commits: ae0d562 [Phillip Cloud] ARROW-1588: [C++/Format] Harden Decimal Format

ARROW-1726: [GLib] Add setup description to verify C GLib build

8148b6d

Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1247 from kou/c-glib-release-verify and squashes the following commits: e9f2307 [Kouhei Sutou] [GLib] Add setup description to verify C GLib build

bigdata-memory force-pushed the master branch 2 times, most recently from 752242f to e699cb5 Compare October 25, 2017 22:22

bigdata-memory force-pushed the master branch from e699cb5 to 9e59d4e Compare October 25, 2017 23:13

added Mnemonic infra. as an alternative backed allocation mechanism, …

c03c4d2

…note that move allocator services to the service-dist folder as the properties indicated in pom.xml.

bigdata-memory force-pushed the master branch from 9e59d4e to c03c4d2 Compare October 26, 2017 04:31

wesm changed the title ~~added Mnemonic infra. as an alternative backed allocation mechanism, …~~ ARROW-1760: [Java] Add Apache Mnemonic (incubating) as alternative backed allocator Nov 1, 2017

wesm closed this Nov 4, 2018

guyuqi mentioned this pull request Nov 22, 2018

ARROW-3849: [C++] Leverage Armv8 crc32 extension instructions to accelerate the hash computation for Arm64 #3010

Closed

jikunshang pushed a commit to jikunshang/arrow that referenced this pull request May 6, 2020

Merge pull request apache#36 from jikunshang/rebase_oap_master

051c679

fix SendCreateRequest, miss a parameter

zhztheplayer pushed a commit to zhztheplayer/arrow-1 that referenced this pull request Nov 9, 2021

BackPort [ARROW-13572]: ORC support , [ARROW-13797]: column projectio…

a2c70e8

…n pushdow, and refactor some classes (apache#36)

paleolimbot mentioned this pull request Jan 28, 2023

[R] Crash on MacOS (x86) when running tests with homebrew apache-arrow also installed #33903

Closed

prniii mentioned this pull request Jan 12, 2024

[Python] macOS Segfault on Import, Both arm64 and x86_64 #37010

Open



bigdata-memory commented Mar 24, 2016

julienledem Aug 8, 2016

bigdata-memory Oct 23, 2017

bigdata-memory Oct 30, 2017

wesm commented Oct 23, 2017

bigdata-memory commented Oct 23, 2017

bigdata-memory commented Oct 25, 2017

bigdata-memory commented Oct 25, 2017

bigdata-memory commented Oct 26, 2017

wesm commented Nov 1, 2017

jacques-n commented Nov 2, 2017

bigdata-memory commented Nov 3, 2017

bigdata-memory commented Nov 3, 2017

wesm commented Nov 4, 2018
